Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding partial eval to SDK #4240

Merged
merged 3 commits into from Mar 11, 2022
Merged

Conversation

kroekle
Copy link
Contributor

@kroekle kroekle commented Jan 18, 2022

Adding the ability to partially evaluate when using the the SDK as a go library. This allow for utilizing the existing OPA configuration (e.g. bundle, decisions, etc) when partially evaluating.

Copy link
Member

@anderseknert anderseknert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need to add a sign-off to your commit for the DCO check.

git commit --amend -s
git push -f

ought to do it

sdk/opa.go Show resolved Hide resolved
sdk/opa.go Outdated
@@ -298,6 +298,96 @@ func newDecisionResult() (*DecisionResult, error) {
return result, nil
}

// Partial returns a named decision. This function is threadsafe.
func (opa *OPA) Partial(ctx context.Context, options PartialOptions) (*DecisionResult, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's some way we can avoid duplicating all of the server/boilerplate, that would be nice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try refactoring this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a shot at refactoring. Still trying to come up to speed with go, so not sure if it's the "go" way. Let me know if you see a better way to refactor.

sdk/opa.go Outdated
// The first interface being returned is the type that will be used for further processing
// The second interface is for the decision logger, most of the time these should be the same
type ResultResolver interface {
ProcessResults(pq *rego.PartialQueries) (interface{}, interface{}, error)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit uncertain about this interface because the system consuming the decison logs won't necessarily have access to the implementation so it won't be able to replay the policy decision. On the other hand, I could see how mapping the partial eval results is valuable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I started coding this up, it became apparent that not having the mapped results being logged the decisions became very difficult to understand. Mapping them to the context that the developer is more likely to understand, the decisions become far more valuable.

I have to admit I didn't think about the downstream affect that this could have on systems built to replay decisions. Maybe having contrib repos setup with the most common mappers would help these systems pull them in and support it better. (I had originally thought of including additional mappers in the SDK, but did not want to muddy up the OPA codebase)

sdk/opa.go Outdated

// The first interface being returned is the type that will be used for further processing
// The second interface is for the decision logger, most of the time these should be the same
type ResultResolver interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe instead of ResultResolver we call this something else like PartialQueryMapper which is a bit more accurate, i.e., the interface provides a mechanism for mapping partial evaluation results (which are exposed as rego.PartialQuery)... let's first decide on whether we need the interface in the first place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I wasn't happy about the name, PartialQueryMapper is a much better name. I'll wait until we decide on the interface before I make the change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I was in the code for the refactor, I did make this name change. Can still pull the entire thing out based on the outcome of the other thread.

sdk/opa.go Outdated
Input interface{} // specifies value of the input document to evaluate policy with
Query string // specifies the query to be partially evaluated
Unknowns []string // specifies the unknown elements of the policy
Mapper PartialQueryMapper // specifies the resolver to use when processing results
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the word 'resolver' is leftover here. s/resolver/mapper/?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like I got them all now

Copy link
Member

@tsandall tsandall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kroekle sorry for the delayed reply. I've left a few more comments after trying this out myself. I think we can keep the partial mapper stuff but I'd like to see a few changes. Let me know what you think.

sdk/opa.go Outdated
Unknowns []string // specifies the unknown elements of the policy
Resolver ResultResolver // specifies the resolver to use when processing results
Now time.Time // specifies wallclock time used for time.now_ns(), decision log timestamp, etc.
Path string // specifies name of policy decision to evaluate (e.g., example/allow)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should leave the path out of the PartialOptions for now.

With the current implementation it's just being set on the server.Info struct and then ignored. It'll show up in the decision log, but other than that, it's not doing anything. For PE use cases, we still don't know how queries are generally going to look (e.g., are users always partially evaluating the full extent of a virtual document or do they want/need to express more sophisticated queries, e.g., data.foo.bar[x]) If we leave the path on the struct as-is, it's going to open up room for mistakes because users won't know how to set it correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed path

sdk/pr/RawResolver.go Outdated Show resolved Hide resolved
sdk/opa.go Outdated
Comment on lines 319 to 321
if options.Mapper == nil {
return nil, fmt.Errorf("PartialQueryMapper must be specified")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: instead of returning an error, we could default to the RawMapper here. I think this would provide a better "getting started" experience.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had an internal debate when I first coded this to do just that. I went ahead and made the change

sdk/opa.go Outdated
Input interface{} // specifies value of the input document to evaluate policy with
}
// Partial returns a named decision. This function is threadsafe.
func (opa *OPA) Partial(ctx context.Context, options PartialOptions) (*DecisionResult, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's define a new struct for the Partial result. Something like:

type PartialResult struct {
  ID string // decision ID
  Result interface{} // mapped result
  AST rego.PartialQuery // raw result
}

This way the caller receives all of the information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this new struct. There are helper methods that were still using DecisionResult that I left (because they are also being used by Decision). If you think it looks messy, I can go ahead and refactor those as well.

sdk/opa.go Outdated
// The first interface being returned is the type that will be used for further processing
// The second interface is for the decision logger, most of the time these should be the same
type PartialQueryMapper interface {
MapResults(pq *rego.PartialQueries) (interface{}, interface{}, error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's change this to something like:

type PartialQueryMapper interface {
  MapResults(*rego.PartialQueries) (json.Marshaler, error)
}

The mapper will return an interface value that implements the MarshalJSON() function. This ensures that the SDK can marshal the result into the decision log record.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to make this work, with my limited knowledge of go, since I didn't define the type, I don't think I can add the function (e.g. I wouldn't be able to add the function to rego.PartialQueries in the RawMapper). I could add the function directly to PartialQueryMapper, but then the type may be doing double work

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So instead of returning json.Marshaler, I decided to make a second function that specifically returns a type that can be logged as JSON. Let me know if you think this is enough.

sdk/opa.go Outdated
var recordResult interface{}
result.Result, recordResult, record.Error = options.Mapper.MapResults(pq)
if record.Error == nil {
record.Results = &recordResult
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's store both the rego.PartialQueries value (which represents the reproducible partial decision) and the mapped result (which represents something the user will understand) here. This'll require an update to the EventV1 struct in the decision logger code (@ plugins/logs/plugin.go).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mad this change. I made a new property for mapped_result and left the PartialQueries in result, so hopefully this should play nice with management systems.

srenatus
srenatus previously approved these changes Feb 24, 2022
Copy link
Contributor

@srenatus srenatus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@anderseknert
Copy link
Member

anderseknert commented Mar 4, 2022

@kroekle can you please squash the commits and add a sign-off so that we may have this merged? 🙂

EDIT: oh, and rebase from latest main while you're at it :)

author Kurt Roekle <kroekle@gmail.com> 1640292949 -0600
committer Kurt Roekle <kroekle@gmail.com> 1646422506 -0600

Adding partial eval to SDK

Signed-off-by: Kurt Roekle <kroekle@gmail.com>

Adding printhook to partial call

Signed-off-by: Kurt Roekle <kroekle@gmail.com>

refactoring to clean up boilerplate code
renamed resolver to mapper

Signed-off-by: Kurt Roekle <kroekle@gmail.com>

Removing path from results
returning new PartialResult type from Partial
new ResultToJSON to ensure that mapped results can be logged as json
added mapped_result so that both PartialQueries and mapped can be logged

Signed-off-by: Kurt Roekle <kroekle@gmail.com>

Removing path from results
returning new PartialResult type from Partial
new ResultToJSON to ensure that mapped results can be logged as json
added mapped_result so that both PartialQueries and mapped can be logged

Signed-off-by: Kurt Roekle <kroekle@gmail.com>

Removing path from results
returning new PartialResult type from Partial
new ResultToJSON to ensure that mapped results can be logged as json
added mapped_result so that both PartialQueries and mapped can be logged

Signed-off-by: Kurt Roekle <kroekle@gmail.com>
@srenatus srenatus merged commit f42b2db into open-policy-agent:main Mar 11, 2022
@srenatus
Copy link
Contributor

@kroekle sorry for the long latency on this one 🐌 thanks again!

srenatus pushed a commit that referenced this pull request Mar 29, 2022
I noticed that when operating on opa.state, locking is usually done to avoid
a race, whereas here opa.state is used directly. By comparing the previous
changes, I found that #4240 changed the previous behaviour.

This change adjusts that: we ensure that we work on the state as read via
the mutex-protected `s := *opa.state`.

Signed-off-by: Iceber Gu <wei.cai-nat@daocloud.io>
damienjburks added a commit to damienjburks/opa that referenced this pull request Apr 1, 2022
Signed-off-by: Damien Burks <damien@damienjburks.com>

removing space from testcase

Signed-off-by: Damien Burks <damien@damienjburks.com>

website: make local dev and PR preview not build everything (open-policy-agent#4474)

With this change, the work done for local development, and the per-PR netlify preview changes:

It will no longer include the website stuff for versions other than edge, the current working tree.

We thus save us the time, and the flakiness, involved with

- checking if github has release binaries for all the versions
- checking out their sources
- fetching the release binaries to pre-hydrate old versions' live-blocks.

The previously-used, documented make target should still be intact.

Fixes open-policy-agent#4379 to some extent, I hope.

* docs/website: remove "latest" binary from opa versions cache

Having a stale binary here -- one called "latest" but not actually
being "latest" -- causes issues like this: when building the website
content for the (real) latest version, the script would take the
old (previous-latest) binary, and fail because that binary didn't
know the latest future keywords.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website: fix download link redirect (open-policy-agent#4482)

Fixes the bug introduced in open-policy-agent#4474 causing the setup-opa action to fail
to retrieve the "latest" opa binary.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

build(deps): bump github.com/yashtewari/glob-intersection to v0.1.0 (open-policy-agent#4481)

https://github.com/yashtewari/glob-intersection/releases/tag/v0.1.0

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

docs/website: fix /docs redirect (open-policy-agent#4483)

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website: fix template to produce proper redirects (open-policy-agent#4484)

Also redirect /docs -> /docs/edge in preview

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

docs/envoy: include new configurable gRPC msg sizes (open-policy-agent#4459)

Docs for open-policy-agent/opa-envoy-plugin#323

Signed-off-by: Elliot Maincourt <e.maincourt@gmail.com>

website/live-blocks: update caniuse/browserlist

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website/live-blocks: don't call the github api to determine release asset urls

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website: don't run generate twice (open-policy-agent#4488)

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

ast: Fix print call rewriting in else rules

The compiler was accidentally checking/rewriting print calls in the
body of else rules before the implicit args of the else rule were
processed. As a result, the compiler was generating false-positive
errors for refs to undeclared args. The root of the problem was the
usage of WalkBodies on the top-level rule (which implicitly walks the
bodies of else rules under the top-level rule). With this change, the
compiler will call WalkBodies on the head and body of each rule rather
than the entire rule itself (which includes the else chain).

Fixes open-policy-agent#4489

Signed-off-by: Torin Sandall <torinsandall@gmail.com>

Update maintainer status, add Will Beason (open-policy-agent#4465)

Signed-off-by: Max Smythe <smythe@google.com>

Co-authored-by: Will Beason <willbeason@google.com>

Make maintainers list alphabetical (open-policy-agent#4491)

Signed-off-by: Max Smythe <smythe@google.com>

plugins/logs: Fix broken retry logic

We were incorrectly resetting the retry counter after
every error condition instead of using the incremented
value. As a result, retry delay would always be 0s.
This meant that if OPA encountered an error while
uploading decision logs it would immediately retry
instead of doing an exponential backoff.

Fixes: open-policy-agent#4486

Signed-off-by: Ashutosh Narkar <anarkar4387@gmail.com>

logging: mask authorization header value in debug logs (open-policy-agent#4496)

Fixes open-policy-agent#4495

Signed-off-by: Anders Eknert <anders@eknert.com>

Fixes open-policy-agent#4376 (open-policy-agent#4494)

Do note though that this does not change how multiple
with statements are grouped. Although I agree with that,
it's IMHO a separate feature request, while the spacing
issue is a bug.

Signed-off-by: Anders Eknert <anders@eknert.com>

runtime: improve log output for binary response (open-policy-agent#4498)

This change omits the response body when using compression on metrics endpoint and when pprof is enabled.

Signed-off-by: Christian Altamirano <christian.altamirano.ayala@gmail.com>

build: bump golang: 1.17 -> 1.18

No change to go.mod's `go` stanza, so no changes in code compatibility.

However, it's used for building our docker images and release
binaries, and for fuzz testing in our nightly workflow.

Some test-related changes with the dns lookup built-in function's
error handling; and the hardcoded signature. Running

    go test ./topdown -run TestTopdownJWTEncodeSignECWithSeedReturnsSameSignature -count 10000

makes me believe that for whatever reason the signature changed,
it's at least stable.

topdown/http_test: Test-only change to accomodate this change in Go (https://go.dev/doc/go1.18):

    Certificate.Verify now uses platform APIs to verify certificate
    validity on macOS and iOS when it is called with a nil
    VerifyOpts.Roots or when using the root pool returned from
    SystemCertPool.

We're keeping the old message for go <= 1.17; in a silly-simple way.

Also:

* ci: build and test two old golang version on macos|linux

  We'll drop golang 1.15, keep one unsupported version (1.16).

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

ci: remove go-fuzz, use native go 1.18 fuzzer

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

build(dev-deps): update live-scripts deps: mocha, nanoid (open-policy-agent#4500)

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

build(deps): bump nanoid in /docs/website/scripts/live-blocks (open-policy-agent#4501)

update bob_token and alice_token in markdown (open-policy-agent#4504)

Signed-off-by: yongen.pan <yongen.pan@daocloud.io>

topdown/eval: copy without modifying expr, update test/e2e/concurrency (open-policy-agent#4503)

What we previously did turned into a race condition with multiple
concurrent calls to /v1/compile.

With a change introduced with 0.38.0 (the `every` keyword), the
`nil` Terms of an `ast.Expr` node was surfaced: previously, it would
go unnoticed, but could potentially have yielded bad results.

The effect of this change is proven using a new e2e test that would
fail on the code we had previous.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

uploading latest changes

Signed-off-by: Damien Burks <damien@damienjburks.com>

fixing test cases

Signed-off-by: Damien Burks <damien@damienjburks.com>

sdk: avoid using different state (open-policy-agent#4505)

I noticed that when operating on opa.state, locking is usually done to avoid
a race, whereas here opa.state is used directly. By comparing the previous
changes, I found that open-policy-agent#4240 changed the previous behaviour.

This change adjusts that: we ensure that we work on the state as read via
the mutex-protected `s := *opa.state`.

Signed-off-by: Iceber Gu <wei.cai-nat@daocloud.io>

build(deps): update opentelemetry-go: 1.5.0 -> 1.6.1 (metrics 0.28.0) (open-policy-agent#4467)

https://github.com/open-telemetry/opentelemetry-go/releases/tag/v1.6.0
https://github.com/open-telemetry/opentelemetry-go/releases/tag/v1.6.1
https://github.com/open-telemetry/opentelemetry-go-contrib/releases/tag/v1.6.0

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

build(deps): open opentelemetry-go 1.6.0 -> 1.6.1 (open-policy-agent#4512)

This is a left-over from the previous bump, 0856120.

https://github.com/open-telemetry/opentelemetry-go/releases/tag/v1.6.1

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

sdk/opa_test: increase max delta (0.5s) (open-policy-agent#4513)

When running on GHA, we've found this test to often fail on macos-latest:
It would not functionally be wrong, but it also wasn't able to finish in
the alloted time. Now, the maximum delta has been increased a lot (10ms to
500ms). It's much, but it's still good enough to ensure that the context
passed to Stop() is the one that matters for shutdown.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

docs: Add clarifications for docs describing bundle signing and delta features

This commit adds some clarification to the bundle doc regarding
the format of bundle signatures and manifest behavior for delta
bundles.

Signed-off-by: Ashutosh Narkar <anarkar4387@gmail.com>

runtime: change processed file event log level to info (open-policy-agent#4514)

"Processed file watch event." will now be logged at level "info" (was "warn").

Signed-off-by: Anders Eknert <anders@eknert.com>

runtime/logging: only suppress payloads for handlers that compress responses (open-policy-agent#4502)

* runtime/logging: only suppress payloads for handlers that compress responses

To get compressed responses, two things need to be true:

1. The client must accept compressed responses
2. The handler must reply with a compressed response

For general API requests, (1) holds most of the time. (2) is only true
for the metrics endpoint at them moment, since the 3rd party library we
use for serving the prometheus endpoint will do compression.

* runtime/logging: remove dead code

The http.Hijack stuff was related to a watch feature removed in
open-policy-agent@186ef99

dropInputParam was only used by its tests.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

storage: code cosmetics

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

storage: allow implementations to override MakeDir

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

runtime+storage: integrate disk storage

With this change, the disk backend (badger) becomes available for
use with the OPA runtime properly:

It can be configured using the `storage.disk` key in OPA's config
(see included documentation).

When enabled,
- any data or policies stored with OPA will persist over restarts
- per-query metrics related to disk usage are reported
- Prometheus metrics per storage operation are exported

The main intention behind this feature is to optimize memory usage:
OPA can now operate on more data than fits into the allotted memory
resources. It is NOT meant to be used as a primary source of truth:
there are no backup/restore or desaster recovery procedures -- you
MUST secure the means to restore the data stored with OPA's disk
storage by yourself.

See also open-policy-agent#4014. Future improvements around bundle loading are
planned.

Some notes on details:

storage/disk: impose same locking regime used with inmem

With this setup, we'll ensure:

- there is only one open write txn at a time
- there are any number of open read txns at a time
- writes are blocked when reads are inflight
- during a commit (and triggers being run), no read txns can be created

This is to ensure the same atomic policy update semantics when using
'disk" as we have with "inmem". We're basically opting out of badger's
currency control and transactionality guarantees. This is because we
cannot piggy back on that to ensure the atomic update we want.

There might be other ways -- using subscribers, and blocking in some
other place -- but this one seems preferrable since it mirrors inmem.

Part of the problem is ErrTxnTooLarge, and committing and renewing
txns when it occurs: that, which is the prescribed solution to txns
growing too big, also means that reads can see half of the "logical"
transaction having been committed, while the rest is still getting
processed.

Another approach would have been using `WriteBatch`, but that won't
let us read from the batch, only apply Set and Delete operations.
We currently need to read (via an iterator) to figure out if we
need to delete keys to replace something in the store.  There is
no DropPrefix operation on the badger txn, or the WriteBatch API.

storage/disk: remove commit-and-renew-txn code for txn-too-big errors

This would break transactional guarantees we care about: while there
can be only one write transaction at a time, read transactions may
happen while a write txn is underway -- with this commit-and-reset
logic, those would read partial data.

Now, the error will be returned to the caller. The maximum txn size
depends on the size of memtables, and could be tweaked manually.
In general, the caller should try to push multiple smaller increments
of the data.

storage/disk: implement noop MakeDir

The MakeDir operation as implemented in the backend-agnostic storage
code has become an issue with the disk store: to write /foo/bar/baz,
we'd have to read /foo (among other subdirs), and that can be _much_
work for the disk backend. With inmem, it's cheap, so this wasn't
problematic before.

Some of the storage/disk/txn.go logic had to be adjusted to properly
do the MakeDir steps implicitly.

The index argument addition to patch() in storage/disk/txn.go was
necessary to keep the error messages conforming to the previous
code path: previously, conflicts (arrays indexed as objects) would
be surfaced in the MakeDir step, now it's entangled with the patch
calculation.

storage/disk: check ctx.Err() in List/Get operations

This won't abort reading a single key, but it will abort iterations.

storage/disk: support patterns in partitions

There is a potential clash here: "*", the path wildcard, is
a valid path section. However, it only affects the case when
a user would want to have a partition at

    /foo/*/bar

and would really mean "*", and not the wildcard.

Storing data at /foo/*/bar with a literal "*" won't be treated
differently than storing something at /fo/xyz/bar.

storage/disk: keep per-txn-type histograms of stats

This is done by reading off the metrics on commit, and shovelling
their numbers into the prometheus collector.

NOTE: if you were to share a metrics object among multiple transactions,
the results would be skewed, as it's not reset. However, our server
handlers don't do that.

storage/disk: opt out of badger's conflict detection

With only one write transaction in flight at any time, the situation
that badger guards against cannot happen:

A transaction has written to a key after the current, to-be-committed
transaction has last read that key from the store.

Since it can't happen, we can ignore the bookkeeping involved. This
improves the time it takes to overwrite existing keys.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

docs/management-bundles: remove quotations with info boxes (open-policy-agent#4517)

Small follow-up to open-policy-agent#4515, using info boxes instead of generic markup.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website: fix community site (open-policy-agent#4522)

* remove hugo templating

This page doesn't go through hugo at all, so these things would
end up on the website.

A quick fix is reverting the bits that added the templating. The
downside is that local dev and PR previews will link to
/docs/latest/ecosystem, which doesn't exist there.

* link Documentation and Download to /docs

For /docs/ (not /docs/latest) we have a redirect in local dev and
PR preview mode:

    /docs -> /docs/edge

so this will be a little less broken: only ecosystem is a 404 link
on the preview.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

integrations: Add Emissary-Ingress (open-policy-agent#4523)

This PR adds the Emissary ingress to the Ecosystem page.
The blog section contains link to an article which guides on Emissary ingress and OPA integrations.

Signed-off-by: Tayyab J <tayyab.jamadar@styra.com>

storage/disk: wildcard partition validation, docs caveat (open-policy-agent#4519)

A bunch of smaller follow-up tasks to open-policy-agent#4381.

* storage/disk_test: check invalid patches with wildcard partition, too
* docs/disk: add caveat re: bundles loaded into memory
* storage/disk: auto-manage /system partitions

If these are found in the user-provided partitions, we'll error out.

* storage/disk: pretty-print partitions with "*" instead of %2A
* storage/disk: respect wildcard-replacement in partition validation

It is now allowed to replace a partition like

    /foo/bar

by

    /foo/*

also if multiple wildcards are used.

Caveats:

You cannot add a wildcard partition like /*/*, since it would overlap
the managed "/system/*" partition.

When attempting to go back from /foo/* to /foo/bar, an error is
raised _unconditionally_ -- we could check the existing data, but
currently don't.

* storage/disk: check prefix when adding wildcard partitions

The previously done check would have falsely returned that there is no problem
when adding a wildcard partition: lookup of "/foo/*" with '*' not interpreted
as a wildcard, but as a string, would yield a not-found, even if there was any
data under /foo/.

Now, we'll check the prefix-until-wildcard. It's more cautious than
theoretically necessary, but safe.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website/style: fix code block bottom margin (open-policy-agent#4526)

The text following regular code blocks was too close. Now, the bottom
margin is the same was used with live-blocks, 1.5rem (24px).

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

docs: Add CloudFormation Hook tutorial (open-policy-agent#4525)

Obviously missing is deployment of OPA to AWS. This will come in the next iteration.

Signed-off-by: Anders Eknert <anders@eknert.com>

Prepare v0.39.0 release (open-policy-agent#4524)

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

Prepare v0.40.0 development (open-policy-agent#4527)

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website: fix styra support link

Signed-off-by: Torin Sandall <torinsandall@gmail.com>

adopters: Add Wealthsimple (open-policy-agent#4530)

Signed-off-by: hannahyeates <hannahgrey06@gmail.com>

ci: fix rego check (open-policy-agent#4532)

* build/policies: format using 0.39.0
* workflow/pull-request: use edge opa for rego PR checks

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website: remove right margin on sidebar (open-policy-agent#4529)

* remove right margin on sidebar
* replace the removed margin with padding

Signed-off-by: orweis <orweis@gmail.com>

website: Remove unused variables to avoid error in strict mode(open-policy-agent#4534)

To Fix:
```
2 errors occurred:
policy.rego:13: rego_compile_error: assigned var header unused
policy.rego:13: rego_compile_error: assigned var signature unused
```

Signed-off-by: panpan0000 <panpan0000@msn.com>

website: show yellow banner for old version (open-policy-agent#4533)

Hopefully this is gets more attention than the yellow "older" button
in the version picker.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

fixing disk diagnostic test case
anderseknert pushed a commit to damienjburks/opa that referenced this pull request Apr 5, 2022
Signed-off-by: Damien Burks <damien@damienjburks.com>

removing space from testcase

Signed-off-by: Damien Burks <damien@damienjburks.com>

website: make local dev and PR preview not build everything (open-policy-agent#4474)

With this change, the work done for local development, and the per-PR netlify preview changes:

It will no longer include the website stuff for versions other than edge, the current working tree.

We thus save us the time, and the flakiness, involved with

- checking if github has release binaries for all the versions
- checking out their sources
- fetching the release binaries to pre-hydrate old versions' live-blocks.

The previously-used, documented make target should still be intact.

Fixes open-policy-agent#4379 to some extent, I hope.

* docs/website: remove "latest" binary from opa versions cache

Having a stale binary here -- one called "latest" but not actually
being "latest" -- causes issues like this: when building the website
content for the (real) latest version, the script would take the
old (previous-latest) binary, and fail because that binary didn't
know the latest future keywords.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website: fix download link redirect (open-policy-agent#4482)

Fixes the bug introduced in open-policy-agent#4474 causing the setup-opa action to fail
to retrieve the "latest" opa binary.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

build(deps): bump github.com/yashtewari/glob-intersection to v0.1.0 (open-policy-agent#4481)

https://github.com/yashtewari/glob-intersection/releases/tag/v0.1.0

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

docs/website: fix /docs redirect (open-policy-agent#4483)

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website: fix template to produce proper redirects (open-policy-agent#4484)

Also redirect /docs -> /docs/edge in preview

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

docs/envoy: include new configurable gRPC msg sizes (open-policy-agent#4459)

Docs for open-policy-agent/opa-envoy-plugin#323

Signed-off-by: Elliot Maincourt <e.maincourt@gmail.com>

website/live-blocks: update caniuse/browserlist

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website/live-blocks: don't call the github api to determine release asset urls

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website: don't run generate twice (open-policy-agent#4488)

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

ast: Fix print call rewriting in else rules

The compiler was accidentally checking/rewriting print calls in the
body of else rules before the implicit args of the else rule were
processed. As a result, the compiler was generating false-positive
errors for refs to undeclared args. The root of the problem was the
usage of WalkBodies on the top-level rule (which implicitly walks the
bodies of else rules under the top-level rule). With this change, the
compiler will call WalkBodies on the head and body of each rule rather
than the entire rule itself (which includes the else chain).

Fixes open-policy-agent#4489

Signed-off-by: Torin Sandall <torinsandall@gmail.com>

Update maintainer status, add Will Beason (open-policy-agent#4465)

Signed-off-by: Max Smythe <smythe@google.com>

Co-authored-by: Will Beason <willbeason@google.com>

Make maintainers list alphabetical (open-policy-agent#4491)

Signed-off-by: Max Smythe <smythe@google.com>

plugins/logs: Fix broken retry logic

We were incorrectly resetting the retry counter after
every error condition instead of using the incremented
value. As a result, retry delay would always be 0s.
This meant that if OPA encountered an error while
uploading decision logs it would immediately retry
instead of doing an exponential backoff.

Fixes: open-policy-agent#4486

Signed-off-by: Ashutosh Narkar <anarkar4387@gmail.com>

logging: mask authorization header value in debug logs (open-policy-agent#4496)

Fixes open-policy-agent#4495

Signed-off-by: Anders Eknert <anders@eknert.com>

Fixes open-policy-agent#4376 (open-policy-agent#4494)

Do note though that this does not change how multiple
with statements are grouped. Although I agree with that,
it's IMHO a separate feature request, while the spacing
issue is a bug.

Signed-off-by: Anders Eknert <anders@eknert.com>

runtime: improve log output for binary response (open-policy-agent#4498)

This change omits the response body when using compression on metrics endpoint and when pprof is enabled.

Signed-off-by: Christian Altamirano <christian.altamirano.ayala@gmail.com>

build: bump golang: 1.17 -> 1.18

No change to go.mod's `go` stanza, so no changes in code compatibility.

However, it's used for building our docker images and release
binaries, and for fuzz testing in our nightly workflow.

Some test-related changes with the dns lookup built-in function's
error handling; and the hardcoded signature. Running

    go test ./topdown -run TestTopdownJWTEncodeSignECWithSeedReturnsSameSignature -count 10000

makes me believe that for whatever reason the signature changed,
it's at least stable.

topdown/http_test: Test-only change to accomodate this change in Go (https://go.dev/doc/go1.18):

    Certificate.Verify now uses platform APIs to verify certificate
    validity on macOS and iOS when it is called with a nil
    VerifyOpts.Roots or when using the root pool returned from
    SystemCertPool.

We're keeping the old message for go <= 1.17; in a silly-simple way.

Also:

* ci: build and test two old golang version on macos|linux

  We'll drop golang 1.15, keep one unsupported version (1.16).

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

ci: remove go-fuzz, use native go 1.18 fuzzer

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

build(dev-deps): update live-scripts deps: mocha, nanoid (open-policy-agent#4500)

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

build(deps): bump nanoid in /docs/website/scripts/live-blocks (open-policy-agent#4501)

update bob_token and alice_token in markdown (open-policy-agent#4504)

Signed-off-by: yongen.pan <yongen.pan@daocloud.io>

topdown/eval: copy without modifying expr, update test/e2e/concurrency (open-policy-agent#4503)

What we previously did turned into a race condition with multiple
concurrent calls to /v1/compile.

With a change introduced with 0.38.0 (the `every` keyword), the
`nil` Terms of an `ast.Expr` node was surfaced: previously, it would
go unnoticed, but could potentially have yielded bad results.

The effect of this change is proven using a new e2e test that would
fail on the code we had previous.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

uploading latest changes

Signed-off-by: Damien Burks <damien@damienjburks.com>

fixing test cases

Signed-off-by: Damien Burks <damien@damienjburks.com>

sdk: avoid using different state (open-policy-agent#4505)

I noticed that when operating on opa.state, locking is usually done to avoid
a race, whereas here opa.state is used directly. By comparing the previous
changes, I found that open-policy-agent#4240 changed the previous behaviour.

This change adjusts that: we ensure that we work on the state as read via
the mutex-protected `s := *opa.state`.

Signed-off-by: Iceber Gu <wei.cai-nat@daocloud.io>

build(deps): update opentelemetry-go: 1.5.0 -> 1.6.1 (metrics 0.28.0) (open-policy-agent#4467)

https://github.com/open-telemetry/opentelemetry-go/releases/tag/v1.6.0
https://github.com/open-telemetry/opentelemetry-go/releases/tag/v1.6.1
https://github.com/open-telemetry/opentelemetry-go-contrib/releases/tag/v1.6.0

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

build(deps): open opentelemetry-go 1.6.0 -> 1.6.1 (open-policy-agent#4512)

This is a left-over from the previous bump, 0856120.

https://github.com/open-telemetry/opentelemetry-go/releases/tag/v1.6.1

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

sdk/opa_test: increase max delta (0.5s) (open-policy-agent#4513)

When running on GHA, we've found this test to often fail on macos-latest:
It would not functionally be wrong, but it also wasn't able to finish in
the alloted time. Now, the maximum delta has been increased a lot (10ms to
500ms). It's much, but it's still good enough to ensure that the context
passed to Stop() is the one that matters for shutdown.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

docs: Add clarifications for docs describing bundle signing and delta features

This commit adds some clarification to the bundle doc regarding
the format of bundle signatures and manifest behavior for delta
bundles.

Signed-off-by: Ashutosh Narkar <anarkar4387@gmail.com>

runtime: change processed file event log level to info (open-policy-agent#4514)

"Processed file watch event." will now be logged at level "info" (was "warn").

Signed-off-by: Anders Eknert <anders@eknert.com>

runtime/logging: only suppress payloads for handlers that compress responses (open-policy-agent#4502)

* runtime/logging: only suppress payloads for handlers that compress responses

To get compressed responses, two things need to be true:

1. The client must accept compressed responses
2. The handler must reply with a compressed response

For general API requests, (1) holds most of the time. (2) is only true
for the metrics endpoint at them moment, since the 3rd party library we
use for serving the prometheus endpoint will do compression.

* runtime/logging: remove dead code

The http.Hijack stuff was related to a watch feature removed in
open-policy-agent@186ef99

dropInputParam was only used by its tests.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

storage: code cosmetics

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

storage: allow implementations to override MakeDir

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

runtime+storage: integrate disk storage

With this change, the disk backend (badger) becomes available for
use with the OPA runtime properly:

It can be configured using the `storage.disk` key in OPA's config
(see included documentation).

When enabled,
- any data or policies stored with OPA will persist over restarts
- per-query metrics related to disk usage are reported
- Prometheus metrics per storage operation are exported

The main intention behind this feature is to optimize memory usage:
OPA can now operate on more data than fits into the allotted memory
resources. It is NOT meant to be used as a primary source of truth:
there are no backup/restore or desaster recovery procedures -- you
MUST secure the means to restore the data stored with OPA's disk
storage by yourself.

See also open-policy-agent#4014. Future improvements around bundle loading are
planned.

Some notes on details:

storage/disk: impose same locking regime used with inmem

With this setup, we'll ensure:

- there is only one open write txn at a time
- there are any number of open read txns at a time
- writes are blocked when reads are inflight
- during a commit (and triggers being run), no read txns can be created

This is to ensure the same atomic policy update semantics when using
'disk" as we have with "inmem". We're basically opting out of badger's
currency control and transactionality guarantees. This is because we
cannot piggy back on that to ensure the atomic update we want.

There might be other ways -- using subscribers, and blocking in some
other place -- but this one seems preferrable since it mirrors inmem.

Part of the problem is ErrTxnTooLarge, and committing and renewing
txns when it occurs: that, which is the prescribed solution to txns
growing too big, also means that reads can see half of the "logical"
transaction having been committed, while the rest is still getting
processed.

Another approach would have been using `WriteBatch`, but that won't
let us read from the batch, only apply Set and Delete operations.
We currently need to read (via an iterator) to figure out if we
need to delete keys to replace something in the store.  There is
no DropPrefix operation on the badger txn, or the WriteBatch API.

storage/disk: remove commit-and-renew-txn code for txn-too-big errors

This would break transactional guarantees we care about: while there
can be only one write transaction at a time, read transactions may
happen while a write txn is underway -- with this commit-and-reset
logic, those would read partial data.

Now, the error will be returned to the caller. The maximum txn size
depends on the size of memtables, and could be tweaked manually.
In general, the caller should try to push multiple smaller increments
of the data.

storage/disk: implement noop MakeDir

The MakeDir operation as implemented in the backend-agnostic storage
code has become an issue with the disk store: to write /foo/bar/baz,
we'd have to read /foo (among other subdirs), and that can be _much_
work for the disk backend. With inmem, it's cheap, so this wasn't
problematic before.

Some of the storage/disk/txn.go logic had to be adjusted to properly
do the MakeDir steps implicitly.

The index argument addition to patch() in storage/disk/txn.go was
necessary to keep the error messages conforming to the previous
code path: previously, conflicts (arrays indexed as objects) would
be surfaced in the MakeDir step, now it's entangled with the patch
calculation.

storage/disk: check ctx.Err() in List/Get operations

This won't abort reading a single key, but it will abort iterations.

storage/disk: support patterns in partitions

There is a potential clash here: "*", the path wildcard, is
a valid path section. However, it only affects the case when
a user would want to have a partition at

    /foo/*/bar

and would really mean "*", and not the wildcard.

Storing data at /foo/*/bar with a literal "*" won't be treated
differently than storing something at /fo/xyz/bar.

storage/disk: keep per-txn-type histograms of stats

This is done by reading off the metrics on commit, and shovelling
their numbers into the prometheus collector.

NOTE: if you were to share a metrics object among multiple transactions,
the results would be skewed, as it's not reset. However, our server
handlers don't do that.

storage/disk: opt out of badger's conflict detection

With only one write transaction in flight at any time, the situation
that badger guards against cannot happen:

A transaction has written to a key after the current, to-be-committed
transaction has last read that key from the store.

Since it can't happen, we can ignore the bookkeeping involved. This
improves the time it takes to overwrite existing keys.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

docs/management-bundles: remove quotations with info boxes (open-policy-agent#4517)

Small follow-up to open-policy-agent#4515, using info boxes instead of generic markup.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website: fix community site (open-policy-agent#4522)

* remove hugo templating

This page doesn't go through hugo at all, so these things would
end up on the website.

A quick fix is reverting the bits that added the templating. The
downside is that local dev and PR previews will link to
/docs/latest/ecosystem, which doesn't exist there.

* link Documentation and Download to /docs

For /docs/ (not /docs/latest) we have a redirect in local dev and
PR preview mode:

    /docs -> /docs/edge

so this will be a little less broken: only ecosystem is a 404 link
on the preview.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

integrations: Add Emissary-Ingress (open-policy-agent#4523)

This PR adds the Emissary ingress to the Ecosystem page.
The blog section contains link to an article which guides on Emissary ingress and OPA integrations.

Signed-off-by: Tayyab J <tayyab.jamadar@styra.com>

storage/disk: wildcard partition validation, docs caveat (open-policy-agent#4519)

A bunch of smaller follow-up tasks to open-policy-agent#4381.

* storage/disk_test: check invalid patches with wildcard partition, too
* docs/disk: add caveat re: bundles loaded into memory
* storage/disk: auto-manage /system partitions

If these are found in the user-provided partitions, we'll error out.

* storage/disk: pretty-print partitions with "*" instead of %2A
* storage/disk: respect wildcard-replacement in partition validation

It is now allowed to replace a partition like

    /foo/bar

by

    /foo/*

also if multiple wildcards are used.

Caveats:

You cannot add a wildcard partition like /*/*, since it would overlap
the managed "/system/*" partition.

When attempting to go back from /foo/* to /foo/bar, an error is
raised _unconditionally_ -- we could check the existing data, but
currently don't.

* storage/disk: check prefix when adding wildcard partitions

The previously done check would have falsely returned that there is no problem
when adding a wildcard partition: lookup of "/foo/*" with '*' not interpreted
as a wildcard, but as a string, would yield a not-found, even if there was any
data under /foo/.

Now, we'll check the prefix-until-wildcard. It's more cautious than
theoretically necessary, but safe.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website/style: fix code block bottom margin (open-policy-agent#4526)

The text following regular code blocks was too close. Now, the bottom
margin is the same was used with live-blocks, 1.5rem (24px).

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

docs: Add CloudFormation Hook tutorial (open-policy-agent#4525)

Obviously missing is deployment of OPA to AWS. This will come in the next iteration.

Signed-off-by: Anders Eknert <anders@eknert.com>

Prepare v0.39.0 release (open-policy-agent#4524)

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

Prepare v0.40.0 development (open-policy-agent#4527)

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website: fix styra support link

Signed-off-by: Torin Sandall <torinsandall@gmail.com>

adopters: Add Wealthsimple (open-policy-agent#4530)

Signed-off-by: hannahyeates <hannahgrey06@gmail.com>

ci: fix rego check (open-policy-agent#4532)

* build/policies: format using 0.39.0
* workflow/pull-request: use edge opa for rego PR checks

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website: remove right margin on sidebar (open-policy-agent#4529)

* remove right margin on sidebar
* replace the removed margin with padding

Signed-off-by: orweis <orweis@gmail.com>

website: Remove unused variables to avoid error in strict mode(open-policy-agent#4534)

To Fix:
```
2 errors occurred:
policy.rego:13: rego_compile_error: assigned var header unused
policy.rego:13: rego_compile_error: assigned var signature unused
```

Signed-off-by: panpan0000 <panpan0000@msn.com>

website: show yellow banner for old version (open-policy-agent#4533)

Hopefully this is gets more attention than the yellow "older" button
in the version picker.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

fixing disk diagnostic test case

removing additional space

Signed-off-by: Damien Burks <damien@damienjburks.com>

removing test files

Signed-off-by: Damien Burks <damien@damienjburks.com>

fixing linting issue

Signed-off-by: Damien Burks <damien@damienjburks.com>

adding test case for logging

Signed-off-by: Damien Burks <damien@damienjburks.com>

ecosystem: adding permit.io (open-policy-agent#4531)

Signed-off-by: Oz Radiano <oz@permit.io>

docs: CloudFormation fixes (open-policy-agent#4541)

* Fix step headers to use h3
* Add warning about boolean values conversion

Signed-off-by: Anders Eknert <anders@eknert.com>
rokkiter pushed a commit to rokkiter/opa that referenced this pull request Apr 18, 2022
I noticed that when operating on opa.state, locking is usually done to avoid
a race, whereas here opa.state is used directly. By comparing the previous
changes, I found that open-policy-agent#4240 changed the previous behaviour.

This change adjusts that: we ensure that we work on the state as read via
the mutex-protected `s := *opa.state`.

Signed-off-by: Iceber Gu <wei.cai-nat@daocloud.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants